Efficient Processing of Distributed Iceberg Semi-joins
Abstract
The Iceberg SemiJoin (ISJ) of two datasets R and S returns the tuples in R which join with at least k tuples of S. The ISJ operator is essential in many practical applications including OLAP, Data Mining and Information Retrieval. In this paper we consider the distributed evaluation of Iceberg SemiJoins, where R and S reside on remote servers. We developed an efficient algorithm which employs Bloom filters. The novelty of our approach is that we interleave the evaluation of the Iceberg set in server S with the pruning of unmatched tuples in server R. Therefore, we are able to (i) eliminate unnecessary tuples early, and (ii) extract accurate Bloom filters from the intermediate hash tables which are constructed during the generation of the Iceberg set. Compared to conventional two-phase approaches, our experiments demonstrate that our method transmits up to 80% less data through the network, while reducing the disk I/O cost.
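To make the ISJ definition and the role of the Bloom filter concrete, the following is a minimal single-process sketch in Python. It shows the conventional two-phase baseline (compute the Iceberg set at S, ship a Bloom filter to R, prune, then verify exactly), not the interleaved algorithm proposed in the paper, whose details are not given in the abstract. The names `BloomFilter`, `iceberg_semijoin`, `num_bits`, and `num_hashes` are hypothetical, and the assumption that the join key is the first field of each R tuple is ours.

```python
import hashlib
from collections import Counter


class BloomFilter:
    """Tiny illustrative Bloom filter (hypothetical helper, not from the paper)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        # May return a false positive, never a false negative.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def iceberg_semijoin(r_tuples, s_keys, k):
    """Return the tuples of R whose join key occurs at least k times in S.

    Assumption: the join key is the first field of each R tuple.
    """
    # "Server S": compute the Iceberg set, i.e. keys with at least k occurrences.
    counts = Counter(s_keys)
    iceberg = {key for key, count in counts.items() if count >= k}

    # "Server S" summarises the Iceberg set in a Bloom filter sent to R.
    bloom = BloomFilter()
    for key in iceberg:
        bloom.add(key)

    # "Server R": prune tuples whose key is definitely not in the Iceberg set.
    candidates = [t for t in r_tuples if t[0] in bloom]

    # Exact check removes any Bloom-filter false positives.
    return [t for t in candidates if t[0] in iceberg]


R = [(1, "a"), (2, "b"), (3, "c")]
S_KEYS = [1, 1, 2, 3, 3, 3]
result = iceberg_semijoin(R, S_KEYS, k=2)
print(result)  # [(1, 'a'), (3, 'c')]
```

In a real distributed setting only the filter and the surviving candidates would cross the network; here both "servers" live in one function purely for illustration. Because the final step checks against the exact Iceberg set, the result does not depend on the filter size, which only affects how many unnecessary candidates are shipped.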
Similar resources
Processing Inequality Queries
Bernstein and Goodman showed that natural inequality (NI) queries can be processed efficiently by semijoins, if there are no multiple inequality join edges, nor cycles with one or zero doublet. In this paper procedures to handle these cases efficiently are given. Multiple inequality join edges can be processed by multi-attribute inequality semijoins. Two procedures based on generalized semi-j...
Analysis of Joins and Semi Joins in a Distributed Database Query
A database is defined as a collection of files or tables, whereas DBMS stands for Database Management System, a collection of unified programs used to manage the overall activities of the database. The two dominant approaches used for storing and managing databases are centralized database management systems and distributed database management systems, in which data is placed at central location and ...
Using Remote Joins for the Processing of Distributed Mobile Queries
The query processing in a mobile computing environment involves join processing among different sites which include static servers and mobile computers. In this paper, we first present some unique features of a mobile environment, and then, in light of these features, devise query processing methods for both join and query processing. Remote mobile joins are said to be effectual if they are, wh...
Efficient Iceberg Query Processing in Sensor Networks
The iceberg query finds data whose aggregate values exceed a pre-specified threshold. To process an iceberg query in sensor networks, all sensor data have to be aggregated and then sensor data whose aggregate values are smaller than the threshold are eliminated. Whether a certain sensor datum is in the query result depends on the other sensor data values. Since sensor nodes are distributed, com...
Distributed Query Processing in the Internet: Exploring Relation Replication and Network Characteristics
We introduce the concept of network graph for distributed query processing. Semijoins and joins are termed contributive replicated semijoins and contributive replicated joins, respectively, when they are interleaved into a join sequence to reduce the amount of data transmission cost required in a network with replicated relations. Our solution procedure consists of three consecutive steps, name...